When collecting open-access data hosted on the websites of government entities or institutions, it is desirable to obtain it in a "consumable format" (xlsx, csv, sav, etc.), but that is not always possible. Techniques such as web scraping overcome this limitation efficiently: they consist of exploring a page's source code, then identifying and extracting the information relevant to the task at hand. The approach is efficient because, once the script has been written, the process is automated and reproducible. In this case, Python is used to obtain Covid-19 data for countries around the world.
Some web scraping strategies require an add-on or additional software. Here, only two Python packages are used. This may vary depending on the characteristics and structure of the web pages involved.
To achieve this goal, the following are required:
- urllib, specifically Request and urlopen from urllib.request, to open and read URLs.
- bs4, to extract data from HTML and XML files.
- As needed, the well-known packages pandas, numpy, datetime, matplotlib and plotly, to process the data, set formats and produce charts.
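Before running the notebook, it can help to confirm that the packages listed above are installed. The following is a minimal sketch and not part of the original workflow; the helper name `missing_packages` is hypothetical, and it relies only on the standard-library function `importlib.util.find_spec`.

```python
import importlib.util

# Top-level import names of the packages used in this notebook.
REQUIRED = ["urllib", "bs4", "pandas", "numpy", "matplotlib", "plotly"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    print("All required packages are available.")
```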
Note. Other sources of Covid-19 data: Healthdata, Ourworldindata, WorldHealthOrganization.
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
from datetime import date, datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as py
import seaborn as sns
import gc
import warnings
warnings.filterwarnings("ignore")
Worldometer reports reset at 00:00 GMT+0, that is, at 19:00 in Peru (GMT-5). After that time, the new-case records remain empty for some countries, including those in the Americas. To obtain the latest complete data for all countries, it is best to select the previous day's information (in Peru, data from before 19:00).
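The time offset described above can be checked with the standard library. This is a minimal sketch that assumes Peru stays at a fixed UTC-5 offset; the names `PERU`, `reset_utc` and `last_complete_day` are illustrative only, and treating the day before the current UTC date as the last complete one is an assumption drawn from the paragraph above.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset assumed for Peru (illustrative constant).
PERU = timezone(timedelta(hours=-5))

# The daily reset happens at 00:00 UTC; the calendar date here is arbitrary.
reset_utc = datetime(2021, 9, 18, 0, 0, tzinfo=timezone.utc)
reset_peru = reset_utc.astimezone(PERU)
print(reset_peru.strftime("%Y-%m-%d %H:%M"))  # 2021-09-17 19:00

# Assumed rule: the last fully reported day is the one before today's UTC date.
now_utc = datetime.now(timezone.utc)
last_complete_day = (now_utc - timedelta(days=1)).date()
print(last_complete_day)
```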


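from datetime import timedelta
today = datetime.now()
print(today)
# Subtract one day with timedelta so the result stays valid on the first day of a month
yesterday = today - timedelta(days=1)
yesterday_str = "%d %s, %d" % (yesterday.day, yesterday.strftime("%b"), yesterday.year)
yesterday_str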
2021-09-18 22:43:23.498026
'17 Sep, 2021'
The next cell sends the query with the line req = Request(url, headers...), whose arguments are the web address stored in the variable url and headers, which identifies the browser. The request is then opened and the response stored in webpage = urlopen(req). Finally, part of the page structure is displayed, after being parsed with BeautifulSoup (imported as soup) and stored in page_soup.
url = "https://www.worldometers.info/coronavirus/#countries"
req = Request(url , headers ={'User-Agent': "Chrome/92.0.4515.159"})
webpage = urlopen(req)
print(webpage)
page_soup = soup(webpage, "html.parser")
# Calling a tag is shorthand for find_all(), so this lists every tag nested inside <head>
page_soup.head()
<http.client.HTTPResponse object at 0x0000024B501A5EE0>
[<meta charset="utf-8"/>,
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>,
<meta content="width=device-width, initial-scale=1" name="viewport"/>,
<title>COVID Live Update: 228,944,788 Cases and 4,700,198 Deaths from the Coronavirus - Worldometer</title>,
<meta content="Live statistics and coronavirus news tracking the number of confirmed cases, recovered patients, tests, and death toll due to the COVID-19 coronavirus from Wuhan, China. Coronavirus counter with new cases, deaths, and number of tests per 1 Million population. Historical data and info. Daily charts, graphs, news and updates" name="description"/>,
<link href="/favicon/favicon.ico" rel="shortcut icon" type="image/x-icon"/>,
<link href="/favicon/apple-icon-57x57.png" rel="apple-touch-icon" sizes="57x57"/>,
<link href="/favicon/apple-icon-60x60.png" rel="apple-touch-icon" sizes="60x60"/>,
<link href="/favicon/apple-icon-72x72.png" rel="apple-touch-icon" sizes="72x72"/>,
<link href="/favicon/apple-icon-76x76.png" rel="apple-touch-icon" sizes="76x76"/>,
<link href="/favicon/apple-icon-114x114.png" rel="apple-touch-icon" sizes="114x114"/>,
<link href="/favicon/apple-icon-120x120.png" rel="apple-touch-icon" sizes="120x120"/>,
<link href="/favicon/apple-icon-144x144.png" rel="apple-touch-icon" sizes="144x144"/>,
<link href="/favicon/apple-icon-152x152.png" rel="apple-touch-icon" sizes="152x152"/>,
<link href="/favicon/apple-icon-180x180.png" rel="apple-touch-icon" sizes="180x180"/>,
<link href="/favicon/android-icon-192x192.png" rel="icon" sizes="192x192" type="image/png"/>,
<link href="/favicon/favicon-32x32.png" rel="icon" sizes="32x32" type="image/png"/>,
<link href="/favicon/favicon-96x96.png" rel="icon" sizes="96x96" type="image/png"/>,
<link href="/favicon/favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/>,
<link href="/favicon/manifest.json" rel="manifest"/>,
<meta content="#ffffff" name="msapplication-TileColor"/>,
<meta content="/favicon/ms-icon-144x144.png" name="msapplication-TileImage"/>,
<meta content="#ffffff" name="theme-color"/>,
<meta content="http://www.worldometers.info/img/worldometers-fb.jpg" property="og:image">
<link href="/css/bootstrap.min.css" rel="stylesheet"/>
<link href="/wm16.css" rel="stylesheet"/>
<link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css" rel="stylesheet"/>
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<script src="/js/jquery.min.js"></script>
<script src="/js/bootstrap.min.js"></script>
<script src="/js/ie10-viewport-bug-workaround.js"></script>
<link href="https://cdn.datatables.net/1.10.19/css/dataTables.bootstrap.min.css" rel="stylesheet" type="text/css">
<script src="https://cdn.datatables.net/1.10.19/js/jquery.dataTables.min.js" type="text/javascript"></script>
<script src="https://cdn.datatables.net/1.10.19/js/dataTables.bootstrap.min.js" type="text/javascript"></script>
<script class="init" type="text/javascript">
$.extend( $.fn.dataTable.defaults, {
responsive: true
} );
$(document).ready(function() {
$('#example2').dataTable( {
"scrollCollapse": true,
"sDom": '<"bottom"flp><"clear">',
"paging": false
} );
} );
</script>
<script class="init" type="text/javascript">
$.extend( $.fn.dataTable.defaults, {
responsive: true
} );
$(document).ready(function() {
$('#table3').dataTable( {
"scrollCollapse": true,
"order": [[ 1, 'desc' ]],
"sDom": '<"bottom"flp><"clear">',
"paging": false
} );
} );
</script>
<script class="init" type="text/javascript">
$.extend( $.fn.dataTable.defaults, {
responsive: true
} );
$(document).ready(function() {
$('#example').dataTable( {
"scrollCollapse": true,
"searching": false,
"sDom": '<"top">rt<"bottom"flp><"clear">',
"paging": false
} );
} );
</script>
<script class="init" type="text/javascript">
$(document).ready(function() {
$('#popbycountry').dataTable();
} );
</script>
<script data-cfasync="false" type="text/javascript">
var freestar = freestar || {};
freestar.hitTime = Date.now();
freestar.queue = freestar.queue || [];
freestar.config = freestar.config || {};
freestar.debug = window.location.search.indexOf('fsdebug') === -1 ? false : true;
freestar.config.enabled_slots = [];
!function(a,b){var c=b.getElementsByTagName("script")[0],d=b.createElement("script"),e="https://a.pub.network/worldometers-info";e+=freestar.debug?"/qa/pubfig.min.js":"/pubfig.min.js",d.async=!0,d.src=e,c.parentNode.insertBefore(d,c)}(window,document);
</script>
</link></meta>,
<link href="/css/bootstrap.min.css" rel="stylesheet"/>,
<link href="/wm16.css" rel="stylesheet"/>,
<link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css" rel="stylesheet"/>,
<script src="/js/jquery.min.js"></script>,
<script src="/js/bootstrap.min.js"></script>,
<script src="/js/ie10-viewport-bug-workaround.js"></script>,
<link href="https://cdn.datatables.net/1.10.19/css/dataTables.bootstrap.min.css" rel="stylesheet" type="text/css">
<script src="https://cdn.datatables.net/1.10.19/js/jquery.dataTables.min.js" type="text/javascript"></script>
<script src="https://cdn.datatables.net/1.10.19/js/dataTables.bootstrap.min.js" type="text/javascript"></script>
<script class="init" type="text/javascript">
$.extend( $.fn.dataTable.defaults, {
responsive: true
} );
$(document).ready(function() {
$('#example2').dataTable( {
"scrollCollapse": true,
"sDom": '<"bottom"flp><"clear">',
"paging": false
} );
} );
</script>
<script class="init" type="text/javascript">
$.extend( $.fn.dataTable.defaults, {
responsive: true
} );
$(document).ready(function() {
$('#table3').dataTable( {
"scrollCollapse": true,
"order": [[ 1, 'desc' ]],
"sDom": '<"bottom"flp><"clear">',
"paging": false
} );
} );
</script>
<script class="init" type="text/javascript">
$.extend( $.fn.dataTable.defaults, {
responsive: true
} );
$(document).ready(function() {
$('#example').dataTable( {
"scrollCollapse": true,
"searching": false,
"sDom": '<"top">rt<"bottom"flp><"clear">',
"paging": false
} );
} );
</script>
<script class="init" type="text/javascript">
$(document).ready(function() {
$('#popbycountry').dataTable();
} );
</script>
<script data-cfasync="false" type="text/javascript">
var freestar = freestar || {};
freestar.hitTime = Date.now();
freestar.queue = freestar.queue || [];
freestar.config = freestar.config || {};
freestar.debug = window.location.search.indexOf('fsdebug') === -1 ? false : true;
freestar.config.enabled_slots = [];
!function(a,b){var c=b.getElementsByTagName("script")[0],d=b.createElement("script"),e="https://a.pub.network/worldometers-info";e+=freestar.debug?"/qa/pubfig.min.js":"/pubfig.min.js",d.async=!0,d.src=e,c.parentNode.insertBefore(d,c)}(window,document);
</script>
</link>,
<script src="https://cdn.datatables.net/1.10.19/js/jquery.dataTables.min.js" type="text/javascript"></script>,
<script src="https://cdn.datatables.net/1.10.19/js/dataTables.bootstrap.min.js" type="text/javascript"></script>,
<script class="init" type="text/javascript">
$.extend( $.fn.dataTable.defaults, {
responsive: true
} );
$(document).ready(function() {
$('#example2').dataTable( {
"scrollCollapse": true,
"sDom": '<"bottom"flp><"clear">',
"paging": false
} );
} );
</script>,
<script class="init" type="text/javascript">
$.extend( $.fn.dataTable.defaults, {
responsive: true
} );
$(document).ready(function() {
$('#table3').dataTable( {
"scrollCollapse": true,
"order": [[ 1, 'desc' ]],
"sDom": '<"bottom"flp><"clear">',
"paging": false
} );
} );
</script>,
<script class="init" type="text/javascript">
$.extend( $.fn.dataTable.defaults, {
responsive: true
} );
$(document).ready(function() {
$('#example').dataTable( {
"scrollCollapse": true,
"searching": false,
"sDom": '<"top">rt<"bottom"flp><"clear">',
"paging": false
} );
} );
</script>,
<script class="init" type="text/javascript">
$(document).ready(function() {
$('#popbycountry').dataTable();
} );
</script>,
<script data-cfasync="false" type="text/javascript">
var freestar = freestar || {};
freestar.hitTime = Date.now();
freestar.queue = freestar.queue || [];
freestar.config = freestar.config || {};
freestar.debug = window.location.search.indexOf('fsdebug') === -1 ? false : true;
freestar.config.enabled_slots = [];
!function(a,b){var c=b.getElementsByTagName("script")[0],d=b.createElement("script"),e="https://a.pub.network/worldometers-info";e+=freestar.debug?"/qa/pubfig.min.js":"/pubfig.min.js",d.async=!0,d.src=e,c.parentNode.insertBefore(d,c)}(window,document);
</script>]
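The request step above can be wrapped in a small helper that also handles network failures. The sketch below is an illustration under stated assumptions rather than the notebook's own code: `build_request` and `fetch_html` are hypothetical names, the 30-second timeout is arbitrary, and decoding as UTF-8 assumes the charset declared in the page's meta tag.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

URL = "https://www.worldometers.info/coronavirus/#countries"
USER_AGENT = "Chrome/92.0.4515.159"  # same value as in the cell above


def build_request(url, user_agent=USER_AGENT):
    """Create the request object without sending anything over the network."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=30):
    """Return the page source as text, or None if the request fails."""
    try:
        with urlopen(build_request(url), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:   # the server answered with an error status
        print("HTTP error:", err.code)
    except URLError as err:    # DNS failure, refused connection, etc.
        print("Connection problem:", err.reason)
    return None


# Building the request is safe to run offline; nothing is sent yet.
req = build_request(URL)
print(req.get_method(), req.host)
```

The string returned by `fetch_html(URL)` can be passed to `soup(html, "html.parser")` in place of the response object.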
Within the code stored in page_soup, the table that holds the information to extract is located by its table tag and id attribute, in this case main_table_countries_yesterday. Since each row corresponds to a country, findAll with the td tag is then used inside a loop that replaces missing or unwanted values, so that only the data of interest is extracted and appended to the list all_data.
# Locate the target table by its id; rows are filtered on an empty style attribute
table = page_soup.findAll("table",{"id":"main_table_countries_yesterday"})
containers = table[0].findAll("tr",{"style":""})
title = containers[0]   # first matched row, kept aside and removed from the data rows
del containers[0]
all_data =[]
clean = True
for country in containers:
    country_data = []
    country_container = country.findAll("td")
    # Rows for China are skipped
    if country_container[1].text =="China":
        continue
    # The first cell (index 0) is not collected
    for i in range(1, len(country_container)):
        final_feature = country_container[i].text
        if clean:
            # Clean every column except the country name (i == 1) and the last one
            if i != 1 and i != len(country_container)-1:
                final_feature = final_feature.replace(",","")
                if final_feature.find('+') != -1:
                    final_feature = final_feature.replace("+","")
                    final_feature = float(final_feature)
                elif final_feature.find("-") != -1:
                    final_feature = final_feature.replace("-","")
                    final_feature = float(final_feature)*-1
        # "N/A" becomes 0 and empty cells become the placeholder -1
        if final_feature == "N/A":
            final_feature = 0
        elif final_feature == "" or final_feature == " ":
            final_feature = -1
        country_data.append(final_feature)
    all_data.append(country_data)
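The cleaning rules applied inside the loop can be isolated in a small function, which makes them easier to test on their own. This is a sketch that mirrors the loop's behaviour as read from the code above; the name `clean_cell` and its `numeric` flag are hypothetical. Like the original loop, it assumes that any numeric cell containing a hyphen is a negative number.

```python
def clean_cell(text, numeric=True):
    """Apply the same rules as the extraction loop to a single cell."""
    value = text
    if numeric:
        # Drop thousands separators, then handle explicit signs.
        value = value.replace(",", "")
        if "+" in value:
            value = float(value.replace("+", ""))
        elif "-" in value:
            value = -float(value.replace("-", ""))
    if value == "N/A":
        return 0      # not applicable is stored as 0
    if value in ("", " "):
        return -1     # empty cells get the placeholder -1
    return value      # unsigned numbers stay as strings at this stage


for raw in ["+1,234", "-56", "1,000", "N/A", ""]:
    print(repr(raw), "->", repr(clean_cell(raw)))
```

Unsigned values such as "1,000" are returned as the string "1000", which matches the original loop; they only become numbers later, when `pd.to_numeric` is applied.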
The extracted data is converted into a DataFrame for processing. The columns are then labeled and cast to the appropriate type.
df = pd.DataFrame(all_data)
df
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | World | 228907394 | 417242.0 | 4699141 | 6767.0 | 205498697 | 442879.0 | 18709556 | 99863 | 29367 | ... | -1 | -1 | -1 | All | \n | -1 | -1 | -1 | -1 | -1 |
| 1 | USA | 42866805 | 64559.0 | 691562 | 849.0 | 32483226 | 49060.0 | 9692017 | 24850 | 128591 | ... | 618987057 | 1856825 | 333357810 | North America | 8 | 482 | 1 | 194 | 3 | 29,074 |
| 2 | India | 33447010 | 31121.0 | 444869 | 306.0 | 32664351 | 39633.0 | 337790 | 8944 | 23951 | ... | 550780273 | 394413 | 1396457441 | Asia | 42 | 3139 | 3 | 22 | 0.2 | 242 |
| 3 | Brazil | 21230325 | 22789.0 | 590547 | 803.0 | 20280294 | 7574.0 | 359484 | 8318 | 99026 | ... | 57282520 | 267185 | 214392467 | South America | 10 | 363 | 4 | 106 | 4 | 1,677 |
| 4 | UK | 7400739 | 30144.0 | 135147 | 164.0 | 5958691 | 24673.0 | 1306901 | 1020 | 108327 | ... | 289860893 | 4242772 | 68318747 | Europe | 9 | 506 | 0 | 441 | 2 | 19,129 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 206 | Falkland Islands | 67 | -1.0 | -1 | -1.0 | 63 | -1.0 | 4 | -1 | 18601 | ... | 7531 | 2090783 | 3602 | South America | 54 | -1 | 0 | -1 | -1 | 1,110 |
| 207 | Montserrat | 32 | -1.0 | 1 | -1.0 | 29 | -1.0 | 2 | -1 | 6405 | ... | 1408 | 281825 | 4996 | North America | 156 | 4996 | 4 | -1 | -1 | 400 |
| 208 | Western Sahara | 10 | -1.0 | 1 | -1.0 | 8 | -1.0 | 1 | -1 | 16 | ... | -1 | -1 | 615102 | Africa | 61510 | 615102 | -1 | -1 | -1 | 2 |
| 209 | Palau | 5 | -1.0 | -1 | -1.0 | 2 | -1.0 | 3 | -1 | 275 | ... | 9380 | 515413 | 18199 | Australia/Oceania | 3640 | -1 | 2 | -1 | -1 | 165 |
| 210 | Total: | 228907394 | 417242.0 | 4699141 | 6767.0 | 205498697 | 442879.0 | 18709556 | 99863 | 29366.7 | ... | -1 | -1 | -1 | All | \n | -1 | -1 | -1 | -1 | -1 |
211 rows × 21 columns
df.drop([15, 16, 17, 18, 19, 20], inplace = True, axis = 1)
column_labels = ["País","Total Casos","Nuevos Casos","Total Muertes","Nuevas Muertes","Total Recuperados","Nuevos Recuperados",
                 "Casos Activos","Serios/Críticos","Total Casos/1M","Muertes/1M","Total Tests","Tests/1M","Población","Continente"]
df.columns = column_labels
df
| | País | Total Casos | Nuevos Casos | Total Muertes | Nuevas Muertes | Total Recuperados | Nuevos Recuperados | Casos Activos | Serios/Críticos | Total Casos/1M | Muertes/1M | Total Tests | Tests/1M | Población | Continente |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | World | 228907394 | 417242.0 | 4699141 | 6767.0 | 205498697 | 442879.0 | 18709556 | 99863 | 29367 | 602.9 | -1 | -1 | -1 | All |
| 1 | USA | 42866805 | 64559.0 | 691562 | 849.0 | 32483226 | 49060.0 | 9692017 | 24850 | 128591 | 2075 | 618987057 | 1856825 | 333357810 | North America |
| 2 | India | 33447010 | 31121.0 | 444869 | 306.0 | 32664351 | 39633.0 | 337790 | 8944 | 23951 | 319 | 550780273 | 394413 | 1396457441 | Asia |
| 3 | Brazil | 21230325 | 22789.0 | 590547 | 803.0 | 20280294 | 7574.0 | 359484 | 8318 | 99026 | 2755 | 57282520 | 267185 | 214392467 | South America |
| 4 | UK | 7400739 | 30144.0 | 135147 | 164.0 | 5958691 | 24673.0 | 1306901 | 1020 | 108327 | 1978 | 289860893 | 4242772 | 68318747 | Europe |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 206 | Falkland Islands | 67 | -1.0 | -1 | -1.0 | 63 | -1.0 | 4 | -1 | 18601 | -1 | 7531 | 2090783 | 3602 | South America |
| 207 | Montserrat | 32 | -1.0 | 1 | -1.0 | 29 | -1.0 | 2 | -1 | 6405 | 200 | 1408 | 281825 | 4996 | North America |
| 208 | Western Sahara | 10 | -1.0 | 1 | -1.0 | 8 | -1.0 | 1 | -1 | 16 | 2 | -1 | -1 | 615102 | Africa |
| 209 | Palau | 5 | -1.0 | -1 | -1.0 | 2 | -1.0 | 3 | -1 | 275 | -1 | 9380 | 515413 | 18199 | Australia/Oceania |
| 210 | Total: | 228907394 | 417242.0 | 4699141 | 6767.0 | 205498697 | 442879.0 | 18709556 | 99863 | 29366.7 | 602.9 | -1 | -1 | -1 | All |
211 rows × 15 columns
for label in df.columns:
    if label != 'País' and label != "Continente":
        df[label] = pd.to_numeric(df[label])
df
| | País | Total Casos | Nuevos Casos | Total Muertes | Nuevas Muertes | Total Recuperados | Nuevos Recuperados | Casos Activos | Serios/Críticos | Total Casos/1M | Muertes/1M | Total Tests | Tests/1M | Población | Continente |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | World | 228907394 | 417242.0 | 4699141 | 6767.0 | 205498697 | 442879.0 | 18709556 | 99863 | 29367.0 | 602.9 | -1 | -1 | -1 | All |
| 1 | USA | 42866805 | 64559.0 | 691562 | 849.0 | 32483226 | 49060.0 | 9692017 | 24850 | 128591.0 | 2075.0 | 618987057 | 1856825 | 333357810 | North America |
| 2 | India | 33447010 | 31121.0 | 444869 | 306.0 | 32664351 | 39633.0 | 337790 | 8944 | 23951.0 | 319.0 | 550780273 | 394413 | 1396457441 | Asia |
| 3 | Brazil | 21230325 | 22789.0 | 590547 | 803.0 | 20280294 | 7574.0 | 359484 | 8318 | 99026.0 | 2755.0 | 57282520 | 267185 | 214392467 | South America |
| 4 | UK | 7400739 | 30144.0 | 135147 | 164.0 | 5958691 | 24673.0 | 1306901 | 1020 | 108327.0 | 1978.0 | 289860893 | 4242772 | 68318747 | Europe |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 206 | Falkland Islands | 67 | -1.0 | -1 | -1.0 | 63 | -1.0 | 4 | -1 | 18601.0 | -1.0 | 7531 | 2090783 | 3602 | South America |
| 207 | Montserrat | 32 | -1.0 | 1 | -1.0 | 29 | -1.0 | 2 | -1 | 6405.0 | 200.0 | 1408 | 281825 | 4996 | North America |
| 208 | Western Sahara | 10 | -1.0 | 1 | -1.0 | 8 | -1.0 | 1 | -1 | 16.0 | 2.0 | -1 | -1 | 615102 | Africa |
| 209 | Palau | 5 | -1.0 | -1 | -1.0 | 2 | -1.0 | 3 | -1 | 275.0 | -1.0 | 9380 | 515413 | 18199 | Australia/Oceania |
| 210 | Total: | 228907394 | 417242.0 | 4699141 | 6767.0 | 205498697 | 442879.0 | 18709556 | 99863 | 29366.7 | 602.9 | -1 | -1 | -1 | All |
211 rows × 15 columns
df.to_csv ('all_data_covid-19.csv', index = False, header=True)
With the final data stored in df, new variables or indicators can now be generated as needed, followed by analysis and graphical exploration.
As an example, new variables and some charts are generated: first in aggregate, then by continent, then segmented by country, and finally a review of the Covid-19 cases in South America.
df["%Inc Casos"] = df["Nuevos Casos"]/df["Total Casos"]*100
df["%Inc Muertes"] = df["Nuevas Muertes"]/df["Total Muertes"]*100
df["%Inc Recuperados"] = df["Nuevos Recuperados"]/df["Total Recuperados"]*100
df
| | País | Total Casos | Nuevos Casos | Total Muertes | Nuevas Muertes | Total Recuperados | Nuevos Recuperados | Casos Activos | Serios/Críticos | Total Casos/1M | Muertes/1M | Total Tests | Tests/1M | Población | Continente | %Inc Casos | %Inc Muertes | %Inc Recuperados |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | World | 228907394 | 417242.0 | 4699141 | 6767.0 | 205498697 | 442879.0 | 18709556 | 99863 | 29367.0 | 602.9 | -1 | -1 | -1 | All | 0.182275 | 0.144005 | 0.215514 |
| 1 | USA | 42866805 | 64559.0 | 691562 | 849.0 | 32483226 | 49060.0 | 9692017 | 24850 | 128591.0 | 2075.0 | 618987057 | 1856825 | 333357810 | North America | 0.150604 | 0.122766 | 0.151032 |
| 2 | India | 33447010 | 31121.0 | 444869 | 306.0 | 32664351 | 39633.0 | 337790 | 8944 | 23951.0 | 319.0 | 550780273 | 394413 | 1396457441 | Asia | 0.093046 | 0.068784 | 0.121334 |
| 3 | Brazil | 21230325 | 22789.0 | 590547 | 803.0 | 20280294 | 7574.0 | 359484 | 8318 | 99026.0 | 2755.0 | 57282520 | 267185 | 214392467 | South America | 0.107342 | 0.135976 | 0.037347 |
| 4 | UK | 7400739 | 30144.0 | 135147 | 164.0 | 5958691 | 24673.0 | 1306901 | 1020 | 108327.0 | 1978.0 | 289860893 | 4242772 | 68318747 | Europe | 0.407311 | 0.121349 | 0.414067 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 206 | Falkland Islands | 67 | -1.0 | -1 | -1.0 | 63 | -1.0 | 4 | -1 | 18601.0 | -1.0 | 7531 | 2090783 | 3602 | South America | -1.492537 | 100.000000 | -1.587302 |
| 207 | Montserrat | 32 | -1.0 | 1 | -1.0 | 29 | -1.0 | 2 | -1 | 6405.0 | 200.0 | 1408 | 281825 | 4996 | North America | -3.125000 | -100.000000 | -3.448276 |
| 208 | Western Sahara | 10 | -1.0 | 1 | -1.0 | 8 | -1.0 | 1 | -1 | 16.0 | 2.0 | -1 | -1 | 615102 | Africa | -10.000000 | -100.000000 | -12.500000 |
| 209 | Palau | 5 | -1.0 | -1 | -1.0 | 2 | -1.0 | 3 | -1 | 275.0 | -1.0 | 9380 | 515413 | 18199 | Australia/Oceania | -20.000000 | 100.000000 | -50.000000 |
| 210 | Total: | 228907394 | 417242.0 | 4699141 | 6767.0 | 205498697 | 442879.0 | 18709556 | 99863 | 29366.7 | 602.9 | -1 | -1 | -1 | All | 0.182275 | 0.144005 | 0.215514 |
211 rows × 18 columns
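Because missing cells were encoded as -1 during extraction, ratios such as -1/-1 show up as 100 % in the table above. One possible way to avoid this, shown here as a sketch on a small made-up frame rather than on the scraped data, is to turn the -1 placeholders into NaN before dividing. The `demo` frame and its values are illustrative only.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample that mimics the scraped columns (not real data).
demo = pd.DataFrame({
    "País": ["A", "B", "C"],
    "Total Casos": [1000, 67, 5],
    "Nuevos Casos": [10.0, -1.0, -1.0],
})

# Treat the -1 placeholder as missing in the numeric columns only.
numeric_cols = ["Total Casos", "Nuevos Casos"]
masked = demo[numeric_cols].replace(-1, np.nan)

# NaN propagates through the division, so no spurious percentages appear.
demo["%Inc Casos"] = masked["Nuevos Casos"] / masked["Total Casos"] * 100
print(demo)
```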
Cases by type:
cases = df[["Total Recuperados","Casos Activos","Total Muertes"]].loc[0]
cases_df = pd.DataFrame(cases).reset_index()
cases_df.columns = ["Tipo","Total"]
cases_df["Porcentaje"] = np.round(100*cases_df['Total']/np.sum(cases_df["Total"]),2)
cases_df["Virus"] = "COVID-19"
#print(cases_df)
fig = px.bar(cases_df, x = "Virus", y = "Porcentaje", color = "Tipo", hover_data = ["Total"])
fig.update_layout(title={'text': "Total de casos Covid-19 según tipo a nivel mundial<br><sup>(Sep 2021)</sup>",
                         'y':0.95,
                         'x':0.5,
                         'xanchor': 'center',
                         'yanchor': 'top'},
                  font=dict(family="Franklin Gothic",
                            size = 14,
                            color="black")
                 )
note = 'Elaboración propia <br>Fuente: Datos de <a href="https://www.worldometers.info/coronavirus/#countries">Worldometers</a> (2021)'
fig.add_annotation(text=note,
                   font=dict(size=12),
                   align="left",
                   x=0.0,
                   y=-0.2,
                   xref="x domain",
                   yref="y domain",
                   showarrow=False,
                  )
fig.show()
cases = df[["Nuevos Recuperados","Nuevos Casos","Nuevas Muertes"]].loc[0]
cases_df = pd.DataFrame(cases).reset_index()
cases_df.columns = ["Tipo","Total"]
cases_df["Porcentaje"] = np.round(100*cases_df['Total']/np.sum(cases_df["Total"]),2)
cases_df["Virus"] = "COVID-19"
#print(cases_df)
fig = px.pie(cases_df, names = "Tipo", values = "Porcentaje", hover_data = ["Total"])
fig.update_layout(title={'text': "Nuevos casos de Covid-19 según tipo a nivel mundial (%)<br><sup>(Sep 2021)</sup>",
                         'y':0.95,
                         'x':0.5,
                         'xanchor': 'center',
                         'yanchor': 'top'},
                  font=dict(family="Franklin Gothic",
                            size = 14,
                            color="black")
                 )
note = 'Elaboración propia <br>Fuente: Datos de <a href="https://www.worldometers.info/coronavirus/#countries">Worldometers</a> (2021)'
fig.add_annotation(text=note,
                   font=dict(size=12),
                   align="left",
                   x=0.0,
                   y=-0.20,
                   xref="x domain",
                   yref="y domain",
                   showarrow=False,
                  )
fig.show()
per = np.round(df[["%Inc Casos","%Inc Muertes","%Inc Recuperados"]].loc[0],2)
per_df = pd.DataFrame(per)
per_df.columns = ["Porcentaje"]
fig = go.Figure()
fig.add_trace(go.Bar(x = per_df.index, y = per_df["Porcentaje"], marker_color = ["cyan","orange", "limegreen"]))
fig.update_layout(title={'text': "Incremento de casos Covid-19 a nivel mundial<br><sup>(Sep 2021)</sup>",
                         'y':0.85,
                         'x':0.5,
                         'xanchor': 'center',
                         'yanchor': 'top'},
                  font=dict(family="Franklin Gothic",
                            size = 14,
                            color="black")
                 )
note = 'Elaboración propia <br>Fuente: Datos de <a href="https://www.worldometers.info/coronavirus/#countries">Worldometers</a> (2021)'
fig.add_annotation(text=note,
                   font=dict(size=12),
                   align="left",
                   x=0.0,
                   y=-0.20,
                   xref="x domain",
                   yref="y domain",
                   showarrow=False,
                  )
fig.show()
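The title and source-note settings are repeated for every chart. One way to reduce the repetition, sketched here with the hypothetical helper names `title_layout` and `source_note`, is to build the keyword arguments once and unpack them into `fig.update_layout` and `fig.add_annotation`. The helpers only assemble plain dictionaries, so they can be checked without drawing anything.

```python
NOTE = ('Elaboración propia <br>Fuente: Datos de '
        '<a href="https://www.worldometers.info/coronavirus/#countries">Worldometers</a> (2021)')


def title_layout(text, y=0.95):
    """Keyword arguments for fig.update_layout(): centred title and font."""
    return {
        "title": {"text": text, "y": y, "x": 0.5,
                  "xanchor": "center", "yanchor": "top"},
        "font": {"family": "Franklin Gothic", "size": 14, "color": "black"},
    }


def source_note(y=-0.2, text=NOTE):
    """Keyword arguments for fig.add_annotation(): note placed under the plot."""
    return {
        "text": text, "font": {"size": 12}, "align": "left",
        "x": 0.0, "y": y, "xref": "x domain", "yref": "y domain",
        "showarrow": False,
    }


layout = title_layout("Incremento de casos Covid-19 a nivel mundial", y=0.85)
print(layout["title"]["xanchor"], source_note()["y"])
```

With a Plotly figure `fig`, the calls would then read `fig.update_layout(**title_layout(...))` and `fig.add_annotation(**source_note())`.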
# numeric_only=True keeps the text column País out of the sum
continent_df = df.groupby("Continente").sum(numeric_only=True).drop("All")
continent_df = continent_df.reset_index()
continent_df
| | Continente | Total Casos | Nuevos Casos | Total Muertes | Nuevas Muertes | Total Recuperados | Nuevos Recuperados | Casos Activos | Serios/Críticos | Total Casos/1M | Muertes/1M | Total Tests | Tests/1M | Población | %Inc Casos | %Inc Muertes | %Inc Recuperados |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Africa | 8221512 | 11301.0 | 206311 | 327.0 | 7489045 | 24562.0 | 526156 | 3664 | 916666.0 | 13777.8 | 70630985 | 7561110 | 1379412756 | -4.368717 | -115.605858 | -5.386255 |
| 1 | Asia | 73831636 | 176397.0 | 1090762 | 2625.0 | 69763800 | 229824.0 | 2977074 | 37876 | 2029799.0 | 20466.0 | 1175839048 | 52655876 | 3200810461 | 20.134167 | -19.093720 | 22.626355 |
| 2 | Australia/Oceania | 200100 | 2596.0 | 2553 | 4.0 | 154350 | 1608.0 | 43196 | 422 | 215324.0 | 2847.0 | 39314378 | 3230372 | 41507126 | -0.475763 | 108.570544 | -46.734119 |
| 3 | Europe | 57540010 | 113110.0 | 1202305 | 1498.0 | 52662474 | 101709.0 | 3675231 | 11837 | 4213858.0 | 73142.0 | 1255310015 | 108928847 | 748177675 | 10.189493 | -60.275798 | 8.764100 |
| 4 | North America | 51470947 | 85275.0 | 1045928 | 1213.0 | 39730619 | 71022.0 | 10694398 | 32350 | 2141597.0 | 28643.0 | 697246427 | 43120590 | 594701585 | 17.820723 | -34.419806 | 2.246732 |
| 5 | South America | 37529161 | 28453.0 | 1146495 | 1002.0 | 33869412 | 14031.0 | 545810 | 13649 | 958335.0 | 26082.0 | 158833320 | 8483318 | 435070003 | 0.651557 | 101.864553 | -0.171588 |
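The %Inc columns in the table above are sums of each country's percentage, which is why some values are negative or exceed 100. A possible alternative, sketched below on made-up numbers, is to recompute the percentage from the summed counts after grouping. The `demo` frame and its values are illustrative, not scraped data.

```python
import pandas as pd

# Made-up counts for two fictional continents (not real data).
demo = pd.DataFrame({
    "Continente": ["X", "X", "Y"],
    "Total Casos": [1000, 3000, 500],
    "Nuevos Casos": [10.0, 90.0, 5.0],
})

# Sum the raw counts per continent first...
by_continent = demo.groupby("Continente")[["Total Casos", "Nuevos Casos"]].sum()

# ...then derive the percentage from the aggregated totals.
by_continent["%Inc Casos"] = (
    by_continent["Nuevos Casos"] / by_continent["Total Casos"] * 100
)
print(by_continent.reset_index())
```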
note = 'Elaboración propia <br>Fuente: Datos de <a href="https://www.worldometers.info/coronavirus/#countries">Worldometers</a> (2021)'
def continent_visualization(vis_list):
    for label in vis_list:
        # Work on a copy so the new columns are not written to a slice of continent_df
        c_df = continent_df[['Continente', label]].copy()
        c_df["Porcentaje"] = np.round(100*c_df[label]/np.sum(c_df[label]),2)
        c_df["Virus"] = "Covid-19"
        fig = px.bar(c_df,
                     x= 'Continente',
                     y= 'Porcentaje',
                     color= 'Continente',
                     hover_data=[label])
        fig.update_layout(title={'text':f"{label} <br><sup>(Actualizado al {yesterday_str})</sup>",
                                 'y':0.95,
                                 'x':0.5,
                                 'xanchor': 'center',
                                 'yanchor': 'top'},
                          font=dict(family="Franklin Gothic",
                                    size = 14,
                                    color="black")
                         )
        fig.add_annotation(text=note,
                           font=dict(size=12),
                           align="left",
                           x=0.0,
                           y=-0.20,
                           xref="x domain",
                           yref="y domain",
                           showarrow=False,
                          )
        fig.show()
        gc.collect()
cases_list = ["Total Casos","Casos Activos", "Nuevos Casos", "Serios/Críticos", "Total Casos/1M", "%Inc Casos"]
deaths_list = ["Total Muertes","Nuevas Muertes", "Muertes/1M", "%Inc Muertes"]
recovered_list = ["Total Recuperados", "Nuevos Recuperados", "%Inc Recuperados"]
continent_visualization(cases_list)
continent_visualization(deaths_list)
continent_visualization(recovered_list)
df = df.drop(len(df)-1)
country_df = df.drop([0])
country_df
| | País | Total Casos | Nuevos Casos | Total Muertes | Nuevas Muertes | Total Recuperados | Nuevos Recuperados | Casos Activos | Serios/Críticos | Total Casos/1M | Muertes/1M | Total Tests | Tests/1M | Población | Continente | %Inc Casos | %Inc Muertes | %Inc Recuperados |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | USA | 42866805 | 64559.0 | 691562 | 849.0 | 32483226 | 49060.0 | 9692017 | 24850 | 128591.0 | 2075.0 | 618987057 | 1856825 | 333357810 | North America | 0.150604 | 0.122766 | 0.151032 |
| 2 | India | 33447010 | 31121.0 | 444869 | 306.0 | 32664351 | 39633.0 | 337790 | 8944 | 23951.0 | 319.0 | 550780273 | 394413 | 1396457441 | Asia | 0.093046 | 0.068784 | 0.121334 |
| 3 | Brazil | 21230325 | 22789.0 | 590547 | 803.0 | 20280294 | 7574.0 | 359484 | 8318 | 99026.0 | 2755.0 | 57282520 | 267185 | 214392467 | South America | 0.107342 | 0.135976 | 0.037347 |
| 4 | UK | 7400739 | 30144.0 | 135147 | 164.0 | 5958691 | 24673.0 | 1306901 | 1020 | 108327.0 | 1978.0 | 289860893 | 4242772 | 68318747 | Europe | 0.407311 | 0.121349 | 0.414067 |
| 5 | Russia | 7254754 | 20329.0 | 197425 | 799.0 | 6485264 | 16247.0 | 572065 | 2300 | 49687.0 | 1352.0 | 186100000 | 1274566 | 146010443 | Europe | 0.280216 | 0.404711 | 0.250522 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205 | Anguilla | 331 | -1.0 | -1 | -1.0 | 317 | -1.0 | 14 | -1 | 21827.0 | -1.0 | 38936 | 2567491 | 15165 | North America | -0.302115 | 100.000000 | -0.315457 |
| 206 | Falkland Islands | 67 | -1.0 | -1 | -1.0 | 63 | -1.0 | 4 | -1 | 18601.0 | -1.0 | 7531 | 2090783 | 3602 | South America | -1.492537 | 100.000000 | -1.587302 |
| 207 | Montserrat | 32 | -1.0 | 1 | -1.0 | 29 | -1.0 | 2 | -1 | 6405.0 | 200.0 | 1408 | 281825 | 4996 | North America | -3.125000 | -100.000000 | -3.448276 |
| 208 | Western Sahara | 10 | -1.0 | 1 | -1.0 | 8 | -1.0 | 1 | -1 | 16.0 | 2.0 | -1 | -1 | 615102 | Africa | -10.000000 | -100.000000 | -12.500000 |
| 209 | Palau | 5 | -1.0 | -1 | -1.0 | 2 | -1.0 | 3 | -1 | 275.0 | -1.0 | 9380 | 515413 | 18199 | Australia/Oceania | -20.000000 | 100.000000 | -50.000000 |
209 rows × 18 columns
LOOK_AT = 5
country = country_df.columns[1:14]
fig = go.Figure()
c = 0
# Add one bar trace per country until LOOK_AT countries have been plotted
for i in country_df.index:
    if c < LOOK_AT:
        fig.add_trace(go.Bar(name= country_df.loc[i, 'País'], x= country, y= country_df.loc[i].iloc[1:14]))
    else:
        break
    c +=1
fig.update_layout(title = {'text':f'Top {LOOK_AT} de países a nivel mundial con casos de Covid-19<br><sup>(Actualizado al {yesterday_str})</sup>',
                           'y':0.9,
                           'x':0.5,
                           'xanchor': 'center',
                           'yanchor': 'top' },
                  yaxis_type = "log",
                  legend_title="Países",
                  font=dict(family="Franklin Gothic",
                            size = 14,
                            color="black")
                 )
note = 'Elaboración propia <br>Fuente: Datos de <a href="https://www.worldometers.info/coronavirus/#countries">Worldometers</a> (2021)'
fig.add_annotation(text=note,
                   font=dict(size=12),
                   align="left",
                   x=0.0,
                   y=-0.25,
                   xref="x domain",
                   yref="y domain",
                   showarrow=False,
                  )
fig.show()
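The loop above takes the first LOOK_AT rows, which works here because the scraped table appears to be sorted by total cases. If that ordering cannot be taken for granted, `DataFrame.nlargest` selects the top rows explicitly. A minimal sketch on made-up data follows; the `demo` frame is illustrative only.

```python
import pandas as pd

LOOK_AT = 2

# Made-up, deliberately unsorted sample (not real data).
demo = pd.DataFrame({
    "País": ["A", "B", "C", "D"],
    "Total Casos": [50, 400, 10, 120],
})

# Pick the LOOK_AT rows with the largest "Total Casos", whatever the row order.
top = demo.nlargest(LOOK_AT, "Total Casos")
print(top["País"].tolist())
```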
south_df = country_df.loc[country_df["Continente"] == "South America"].reset_index()
south_df = south_df.drop(columns=["index"])
print("Dimension of table",south_df.shape)
south_df
Dimension of table (14, 18)
| | País | Total Casos | Nuevos Casos | Total Muertes | Nuevas Muertes | Total Recuperados | Nuevos Recuperados | Casos Activos | Serios/Críticos | Total Casos/1M | Muertes/1M | Total Tests | Tests/1M | Población | Continente | %Inc Casos | %Inc Muertes | %Inc Recuperados |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Brazil | 21230325 | 22789.0 | 590547 | 803.0 | 20280294 | 7574.0 | 359484 | 8318 | 99026.0 | 2755.0 | 57282520 | 267185 | 214392467 | South America | 0.107342 | 0.135976 | 0.037347 |
| 1 | Argentina | 5238610 | 1451.0 | 114367 | 81.0 | 5093351 | 2902.0 | 30892 | 1516 | 114634.0 | 2503.0 | 24252818 | 530711 | 45698752 | South America | 0.027698 | 0.070825 | 0.056976 |
| 2 | Colombia | 4939251 | 1655.0 | 125860 | 34.0 | 4777796 | 1627.0 | 35595 | 542 | 95832.0 | 2442.0 | 25011054 | 485270 | 51540486 | South America | 0.033507 | 0.027014 | 0.034053 |
| 3 | Peru | 2166419 | 886.0 | 198976 | 28.0 | 0 | 0.0 | 0 | 1034 | 64614.0 | 5935.0 | 17463391 | 520853 | 33528461 | South America | 0.040897 | 0.014072 | NaN |
| 4 | Chile | 1646994 | 591.0 | 37339 | 21.0 | 1603507 | 501.0 | 6148 | 445 | 85269.0 | 1933.0 | 21193085 | 1097222 | 19315228 | South America | 0.035884 | 0.056241 | 0.031244 |
| 5 | Ecuador | 505860 | -1.0 | 32559 | -1.0 | 443880 | -1.0 | 29421 | 759 | 28154.0 | 1812.0 | 1798012 | 100069 | 17967648 | South America | -0.000198 | -0.003071 | -0.000225 |
| 6 | Bolivia | 496700 | 346.0 | 18648 | 20.0 | 450400 | 847.0 | 27652 | 220 | 41859.0 | 1572.0 | 2353306 | 198324 | 11865993 | South America | 0.069660 | 0.107250 | 0.188055 |
| 7 | Paraguay | 459622 | 42.0 | 16126 | 3.0 | 441547 | 185.0 | 1949 | 36 | 63495.0 | 2228.0 | 1814736 | 250698 | 7238742 | South America | 0.009138 | 0.018603 | 0.041898 |
| 8 | Uruguay | 387555 | 106.0 | 6048 | 2.0 | 379883 | 144.0 | 1624 | 11 | 111100.0 | 1734.0 | 3497016 | 1002487 | 3488342 | South America | 0.027351 | 0.033069 | 0.037906 |
| 9 | Venezuela | 353401 | -1.0 | 4275 | -1.0 | 337230 | -1.0 | 11896 | 681 | 12471.0 | 151.0 | 3359014 | 118534 | 28338074 | South America | -0.000283 | -0.023392 | -0.000297 |
| 10 | French Guiana | 38266 | -1.0 | 235 | -1.0 | 9995 | -1.0 | 28036 | 34 | 124223.0 | 763.0 | 382427 | 1241473 | 308043 | South America | -0.002613 | -0.425532 | -0.010005 |
| 11 | Suriname | 36746 | 318.0 | 803 | 7.0 | 26488 | 97.0 | 9455 | 22 | 61966.0 | 1354.0 | 120401 | 203035 | 593007 | South America | 0.865400 | 0.871731 | 0.366204 |
| 12 | Guyana | 29345 | 273.0 | 713 | 7.0 | 24978 | 158.0 | 3654 | 32 | 37091.0 | 901.0 | 298009 | 376674 | 791158 | South America | 0.930312 | 0.981767 | 0.632557 |
| 13 | Falkland Islands | 67 | -1.0 | -1 | -1.0 | 63 | -1.0 | 4 | -1 | 18601.0 | -1.0 | 7531 | 2090783 | 3602 | South America | -1.492537 | 100.000000 | -1.587302 |
LOOK_AT = 5
south = south_df.columns[1:14]
fig = go.Figure()
c = 0
# Add one bar trace per country until LOOK_AT countries have been plotted
for i in south_df.index:
    if c < LOOK_AT:
        # Use the labels defined for this block (south) instead of the earlier variable country
        fig.add_trace(go.Bar(name= south_df.loc[i, 'País'], x= south, y= south_df.loc[i].iloc[1:14]))
    else:
        break
    c +=1
fig.update_layout(title = {'text':f'Top {LOOK_AT} de países de América del Sur por casos de Covid-19<br><sup>(Actualizado al {yesterday_str})</sup>',
                           'y':0.9,
                           'x':0.5,
                           'xanchor': 'center',
                           'yanchor': 'top' },
                  yaxis_type = "log",
                  legend_title="Países",
                  font=dict(family="Franklin Gothic",
                            size = 14,
                            color="black")
                 )
note = 'Elaboración propia <br>Fuente: Datos de <a href="https://www.worldometers.info/coronavirus/#countries">Worldometers</a> (2021)'
fig.add_annotation(text=note,
                   font=dict(size=12),
                   align="left",
                   x=0.0,
                   y=-0.25,
                   xref="x domain",
                   yref="y domain",
                   showarrow=False,
                  )
fig.show()
south_df1 = south_df.loc[south_df["Muertes/1M"] > 0]
fig = px.scatter(south_df1, x="Total Casos", y="Población",
                 size="Muertes/1M", color="País",
                 hover_name="País", log_x=True, size_max=60)
fig.update_layout(
    title={
        'text': "Población vs Total de casos Covid-19 en Sudamérica <br><sup>(Tamaño determinado por Muertes por millón, Sep 2021)</sup> ",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    legend_title="Países",
    font=dict(family="Franklin Gothic",
              size = 13,
              color="black")
)
note = 'Elaboración propia <br>Fuente: Datos de <a href="https://www.worldometers.info/coronavirus/#countries">Worldometers</a> (2021)'
fig.add_annotation(text=note,
                   font=dict(size=12),
                   align="left",
                   x=0.0,
                   y=-0.2,
                   xref="x domain",
                   yref="y domain",
                   showarrow=False,
                  )
fig.show()